Replace `stop` with `return` and provide and `bmi_failure` flags for EnergyModule error #108

SnowHydrology · 2024-04-30T19:30:18Z

This PR replaces the stop statement with return in EnergyModule.f90. This will allow the NextGen framework to provide basin and timestep information when BMI_FAILURE is reported.

Changes:

bmi/bmi_noahowp.f90: return BMI_FAILURE to stop model in framework if error flag doesn't equal 0
src/DomainType.f90: add error flag to domain type so value is accessible
src/EnergyModule.f90: Replace stop with return and assign error flag value

PhilMiller

It looks like some changes may need to be made up the call stack from EnergyMain to avoid weird side effects and consequences.

As I read it, we're expecting a stack like this where the error occurs:

noahowp_update_until
noahowp_update
advance_in_time
solve_noahowp
EnergyMain

When we hit the early return from EnergyMain, there's still the following call to WaterMain in solve_noahowp that'll run, unless error checking and early returns are propagated.

drakest123 · 2024-05-07T17:15:53Z

I agree with @PhilMiller that we need to immediately propagate an error up the call stack to avoid side effects of a late return from an error. I don’t see how to avoid conditional statements that check for errors before the end of a subroutine but most of the stop statements come from initialization checks at the beginning of a model run. In that case, conditional statements would have minimal impact on execution speed so the remaining consideration is with cluttering the code with conditional statements. Based on the example in PR #108, I’ve prototyped dealing with upstream error propagation. Logging functionality is included (in log_error() subroutine) such that write statements are contained within a single subroutine. log_error() would replace handle_err() and sys_abort() subroutines. Consistent with NGEN, existing preprocessor directives would control program flow (in driver/NoahModularDriver.f90) and logging status for non-fatal messages (in log_error() . Subroutines that include an error check would need to use ErrorCheckModule. ErrorCheckModule contains three public parameters:

  integer, parameter, public :: NOM_SUCCESS = 0
  integer, parameter, public :: NOM_FAILURE = 1
  integer, parameter, public :: NOM_MESSAGE = 2

The values of NOM_SUCCESS and NOM_FAILURE are consistent with BMI_SUCCESS and BMI_FAILURE. Impacted files are:

% git status
On branch error_handling
Your branch is up to date with 'origin/error_handling'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   ../bmi/bmi_noahowp.f90
	modified:   ../driver/NoahModularDriver.f90
	modified:   EnergyModule.f90
	modified:   ErrorCheckModule.f90
	modified:   RunModule.f90

File additions/changes are:

% git diff
diff --git a/bmi/bmi_noahowp.f90 b/bmi/bmi_noahowp.f90
index d26aaef..3178acc 100644
--- a/bmi/bmi_noahowp.f90
+++ b/bmi/bmi_noahowp.f90
@@ -267,14 +267,14 @@ contains
   function noahowp_update(this) result (bmi_status)
     class (bmi_noahowp), intent(inout) :: this
     integer :: bmi_status
+    bmi_status = BMI_SUCCESS
 
     call advance_in_time(this%model)
-    if (this%model%domain%error_flag /= 0) then
+    if (this%model%domain%error_flag == BMI_FAILURE) then
       bmi_status = BMI_FAILURE
       return
-    else
-      bmi_status = BMI_SUCCESS
     end if
+
   end function noahowp_update
 
   ! Advance the model until the given time.
diff --git a/driver/NoahModularDriver.f90 b/driver/NoahModularDriver.f90
index a86ef93..f799713 100644
--- a/driver/NoahModularDriver.f90
+++ b/driver/NoahModularDriver.f90
@@ -43,6 +43,14 @@ program model_driver
   print*, "Running..."
   do while (current_time < end_time)
     status = m%update()                       ! run the model one time step
+    if (status == BMI_FAILURE) then
+#ifdef NGEN_ACTIVE
+      return status                           ! if NGEN
+#else
+      print*, "Stopping program."
+      stop
+#endif
+    end if
     status = m%get_current_time(current_time) ! update current_time
   end do
 
diff --git a/src/EnergyModule.f90 b/src/EnergyModule.f90
index 22c7577..4d0f44e 100644
--- a/src/EnergyModule.f90
+++ b/src/EnergyModule.f90
@@ -2,6 +2,7 @@ module EnergyModule
 
   use LevelsType
   use DomainType
+  use ErrorCheckModule
   use OptionsType
   use ParametersType
   use WaterType
@@ -45,6 +46,7 @@ contains
     REAL                                 :: D_RSURF  ! Reduced vapor diffusivity in soil for computing RSURF (SZ09)
     
     REAL                                 :: FIRE   !emitted IR (w/m2)
+    CHARACTER(len=256)                   :: error_string
     !---------------------------------------------------------------------
 
     ! Initialize the the fluxes from the vegetated fraction
@@ -299,13 +301,20 @@ contains
     END IF
 
     FIRE = forcing%LWDN + energy%FIRA
+    FIRE = -1     ! TESTING ERROR HANDLING
     IF(FIRE <=0.) THEN
-      domain%error_flag = 1
-      WRITE(*,*) 'emitted longwave <0; skin T may be wrong due to inconsistent'
-      WRITE(*,*) 'input of SHDFAC with LAI'
-      WRITE(*,*) domain%ILOC, domain%JLOC, 'SHDFAC=',parameters%FVEG,'parameters%VAI=',parameters%VAI,'TV=',energy%TV,'TG=',energy%TG
-      WRITE(*,*) 'LWDN=',forcing%LWDN,'energy%FIRA=',energy%FIRA,'water%SNOWH=',water%SNOWH
-      WRITE(*,*) 'Exiting ...'
+      domain%error_flag = NOM_FAILURE
+      !WRITE(*,*) 'emitted longwave <0; skin T may be wrong due to inconsistent'
+      !WRITE(*,*) 'input of SHDFAC with LAI'
+      !WRITE(*,*) domain%ILOC, domain%JLOC, 'SHDFAC=',parameters%FVEG,'parameters%VAI=',parameters%VAI,'TV=',energy%TV,'TG=',energy%TG
+      !WRITE(*,*) 'LWDN=',forcing%LWDN,'energy%FIRA=',energy%FIRA,'water%SNOWH=',water%SNOWH
+      !WRITE(*,*) 'Exiting ...'
+      write(error_string,101) 'EnergyMain: emitted longwave <0; skin T &
+       may be wrong due to inconsistent input of SHDFAC with LAI. ILOC=', &
+       domain%ILOC, ', JLOC=',domain%JLOC, ', SHDFAC=',parameters%FVEG,', &
+       parameters%VAI=',parameters%VAI,', TV=',energy%TV,', TG=',energy%TG
+101   format(A,I10,A,I10,A,F8.3,A,F8.3,A,F8.3,A,F8.3)
+      call log_message(domain%error_flag, trim(error_string))
       RETURN
     END IF
 
diff --git a/src/ErrorCheckModule.f90 b/src/ErrorCheckModule.f90
index f330696..33ef516 100644
--- a/src/ErrorCheckModule.f90
+++ b/src/ErrorCheckModule.f90
@@ -1,12 +1,15 @@
 module ErrorCheckModule
 
-  ! General error checking routins
-
+  ! General error checking routines
   implicit none
+  integer, parameter, public :: NOM_SUCCESS = 0
+  integer, parameter, public :: NOM_FAILURE = 1
+  integer, parameter, public :: NOM_MESSAGE = 2
 
   private
   public:: sys_abort
   public:: is_within_bound
+  public:: log_message
 
   interface is_within_bound
     module procedure is_within_bound_int
@@ -32,6 +35,29 @@ contains
 
   end subroutine sys_abort
 
+  subroutine log_message(err, message)
+
+    ! log information, typically an error
+
+    implicit none
+
+    integer, intent(in) :: err                  ! error code
+    character(*), intent(in) :: message         ! message
+
+    ! If error, write the error. If message, write message unless NGEN_QUIET
+    if(err==NOM_FAILURE)then
+      write(*, '(A,I2,A)') ' Error Code: ', err, ',  Message: '//trim(message)
+      call flush(6)
+    endif
+#ifndef NGEN_QUIET
+    if(err==NOM_MESSAGE)then
+      write(*, '(A,I2,A)') ' Error Code: ', err, ',  Message: '//trim(message)
+      call flush(6)
+    endif
+#endif
+
+  end subroutine log_message
+
   function is_within_bound_int(var, lower_bound, upper_bound) result(withinbound)
 
     ! check if a integer value is within specified bounds
diff --git a/src/RunModule.f90 b/src/RunModule.f90
index cb357b8..fdcb20c 100644
--- a/src/RunModule.f90
+++ b/src/RunModule.f90
@@ -18,8 +18,10 @@ module RunModule
   use EnergyModule
   use WaterModule
   use DateTimeUtilsModule
+  use ErrorCheckModule
   
   implicit none
+
   type :: noahowp_type
     type(namelist_type)   :: namelist
     type(levels_type)     :: levels
@@ -246,6 +248,9 @@ contains
     type (noahowp_type), intent (inout) :: model
 
     call solve_noahowp(model)
+    if (model%domain%error_flag == NOM_FAILURE) then
+      return
+    end if
 
     model%domain%itime    = model%domain%itime + 1 ! increment the integer time by 1
     model%domain%time_dbl = dble(model%domain%time_dbl + model%domain%dt) ! increment model time in seconds by DT
@@ -308,6 +313,9 @@ contains
     !---------------------------------------------------------------------
 
     call EnergyMain (domain, levels, options, parameters, forcing, energy, water)
+    if (domain%error_flag == NOM_FAILURE) then
+      return
+    end if
 
     !---------------------------------------------------------------------
     ! call the main water routines (canopy + snow + soil water components)

GreyREvenson · 2024-05-22T17:18:54Z

Offering my thoughts for consideration, @SnowHydrology

First, on the overall approach, the use of an error flag variable seems appropriate (e.g., domain%error_flag). I failed to find a superior solution and am not confident that one will be found (see Does exception handling exist in Fortran?). Unfortunately, it does seem like we need to add flag checks (e.g., if (domain%error_flag .eq. NOM_FAILURE) return -- the one line syntax could help decrease clutter) immediately after any call to a subroutine within which the error flag could be set. This is a lot of clutter, as @drakest123 noted, but would be necessary to satisfy @PhilMiller's stipulation that the early return be immediately propagated up the stack. I don't know how else to do it.

Second, regarding the error message, I'd advocate for saving the error message and printing when NGEN would print the time step and location information for the error (if possible). Instead, this code (to my current understanding) calls log_message (in ErrorCheckModule) to print the error message when the error is encountered.

Third, @drakest123, I'm not sure if you meant to add your commits atop those from @SnowHydrology in his error_handling branch but I'm noting that here as it caused some initial confusion on my part.

GreyREvenson · 2024-05-22T18:54:49Z

src/DomainType.f90

@@ -31,6 +32,8 @@ module DomainType
  integer             :: croptype          ! crop type
  integer             :: isltyp            ! soil type
  integer             :: IST               ! surface type 1-soil; 2-lake
+  integer             :: error_flag        ! flag for energy balance error (0 = no error, 1 = longwave < 0)


If we use ErrorCheckModule to deal with the error messages, we may want to move the error_flag flag to ErrorCheckModule instead.

That is a possible solution although I'd need to also review how domain%error_flag is used apart from how we are using it for the current set of updates. Another possibility is to bind a new error object to the model% object and dissociate error_flag from domain%.

GreyREvenson · 2024-05-22T19:26:16Z

driver/AsciiReadModule.f90


  implicit none

 contains

-  subroutine open_forcing_file(filename)
+  subroutine open_forcing_file(filename, error_flag)


Rather than explicitly passing error_flag, could you just make error_flag a public variable in ErrorCheckModule and set it in anywhere that uses ErrorCheckModule?

Along those same lines, could you make a public character string in ErrorCheckModule and write any error message to that variable when the error is encountered? Then use ErrorCheckModule in bmi_noahowp.f90 to be able to print the error message there (when the error code has already been passed up the stack)?

Yes, it would be possible to remove error_flag from some of the intermediate call signatures. That would clean up some of the call signatures at the expense of requiring programming care not to co-opt that variable for other uses.

Passing the error string up the stack would require adding an error string to the domain object (as currently constructed). I agree that would be more complete. That is Keith's call. If we do this, then it would make sense to follow through with your first suggestion about removing error_flag from some call signatures. When passing these to two variables up to BMI, we would need to either pass a single error object or pass both the error code and error string. I'd prefer to avoid binding ErrorCheckModule to BMI.

GreyREvenson · 2024-05-22T19:28:39Z

driver/AsciiReadModule.f90

-       stop ":  ERROR EXIT"
+       error_flag = NOM_FAILURE
+       write(error_string,'(A,A,A)') "AsciiReadModule.f90:open_forcing_file(): File: ",trim(filename), " does not exist. Check the forcing file specified as a command-line argument"
+       call log_message(error_flag, error_string)


See previous comment regarding writing to a possible public character variable in ErrorCheckModule instead of calling log_message to print right away.

Also, seems like kind of a bummer that we'd have to manually record where in the program the error occurs via the error message. My understanding is that this is because we're going to use return statements instead of stop statements in which case the program won't record any locational information (i.e., where in the code the error happens). Is there any way to automatically record locational information to pass along side the error code and message?

If this code is being run through the C pre-processor (as use of #ifdef NGEN_ACTIVE suggests), then we could define a macro along the lines of

#define NOAH_OWP_MODULAR_REPORT_ERROR(error_string) \ error_flag = NOM_FAILURE \ call log_error_message(error_flag, error_string, __FILE__, __LINE__)

Those __FILE__ and __LINE__ macros would capture the location of the error report, though not the function name

I'm a touch concerned that these files might not get pre-processed as expected, since that usually seems to go with a .F90 (upper-case) file extension. As I understand it, though, that's just a convention, and the build scripts could do whatever. The shift to a cmake-driven build system may change that, though.

I considered the preprocessor approach that @PhilMiller discusses. Besides the issues with this approach that Phil brings up, I don't think that using preprocessor directives for delineating error sources would be appropriate. Currently, the program has sparse reliance on preprocessor directives and they are used for major functional conditions such as whether the code is run standalone or as a module. These few preprocessor directives could easily be removed so the reliance on them is easy to decouple.

One possible intermediate improvement would be to define a string that contains the subroutine name and another string with module scope that contains the module name. Then, the error string would refer to those variables instead of them being hardwired. Given that LINE would unambiguously identify the line where the error is reported, unique error messages alternatively identify error sources although the onus is more on the programmer.

drakest123 · 2024-06-15T00:17:29Z

Updated error handling, addressing some of Grey Evenson's suggestions. Ready for review by @SnowHydrology. Changes include:

add ErrorType to contain error flag and error message in the BMI context. ErrorType is instantiated in RunModule before other objects so that it is available to be populated with error information early in the initialization process. This new file necessitated changes to several Makefiles.
removed error_flag as a variable in DomainType
removed error_flag from many subroutine call signatures, instead relying on USE ErrorCheckModule to pass error information
in RunModule, error_flag and error_string are passed from ErrorCheckModule to ErrorType. The save_error_state() method decouples the model code from BMI.
added moduleName as a private constant string that contains the name of the module that produced an error
added subroutineName as a local constant string that contains the name of the subroutine that produced an error
note that ErrorCheckModule:log_message() contains error_flag and error_string even though these variables are implicitly passed as members of the module. Explicitly passing these variables allows for passing alternate values in a context that does not explicitly support function overloading.
all of the error checks that were touched by the working code were testing in an intermediate state. Many of the error checks were tested in the final state. Error checks that were not touched by the working program (inactive code) were not tested.
currently, error_flag is returned to a calling program (in the form of bmi_status). error_string is available in the ErrorType object if, in the future, the calling program requests the error message in the subroutine signature.

return and bmi_failure flags for energymodule error

f939e07

PhilMiller reviewed Apr 30, 2024

View reviewed changes

drakest123 added 3 commits May 13, 2024 12:08

Changed STOP to RETURN with signature changes up call stack

be6a639

Replaced STOP and sys_abort() with RETURN up call stack

0f4dc8c

Commented out unused SUBROUTINE parse() to eliminate stop statement

989ba81

SnowHydrology marked this pull request as draft May 14, 2024 19:10

GreyREvenson reviewed May 22, 2024

View reviewed changes

GreyREvenson mentioned this pull request May 22, 2024

Add check for consistency in land cover and soil #100

Merged

drakest123 added 2 commits June 11, 2024 11:08

initial changes removing error_flag from call signatures

4c8a086

Introduce ErrorType to return error to calling program

5fcfeed

drakest123 marked this pull request as ready for review June 15, 2024 00:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace `stop` with `return` and provide and `bmi_failure` flags for EnergyModule error #108

Replace `stop` with `return` and provide and `bmi_failure` flags for EnergyModule error #108

SnowHydrology commented Apr 30, 2024 •

edited

Loading

PhilMiller left a comment

drakest123 commented May 7, 2024

GreyREvenson commented May 22, 2024 •

edited

Loading

GreyREvenson May 22, 2024

drakest123 May 24, 2024

GreyREvenson May 22, 2024 •

edited

Loading

drakest123 May 24, 2024

GreyREvenson May 22, 2024

GreyREvenson May 22, 2024 •

edited

Loading

PhilMiller May 23, 2024

drakest123 May 24, 2024

drakest123 commented Jun 15, 2024 •

edited

Loading

Replace stop with return and provide and bmi_failure flags for EnergyModule error #108

Are you sure you want to change the base?

Replace stop with return and provide and bmi_failure flags for EnergyModule error #108

Conversation

SnowHydrology commented Apr 30, 2024 • edited Loading

Changes:

PhilMiller left a comment

Choose a reason for hiding this comment

drakest123 commented May 7, 2024

GreyREvenson commented May 22, 2024 • edited Loading

GreyREvenson May 22, 2024

Choose a reason for hiding this comment

drakest123 May 24, 2024

Choose a reason for hiding this comment

GreyREvenson May 22, 2024 • edited Loading

Choose a reason for hiding this comment

drakest123 May 24, 2024

Choose a reason for hiding this comment

GreyREvenson May 22, 2024

Choose a reason for hiding this comment

GreyREvenson May 22, 2024 • edited Loading

Choose a reason for hiding this comment

PhilMiller May 23, 2024

Choose a reason for hiding this comment

drakest123 May 24, 2024

Choose a reason for hiding this comment

drakest123 commented Jun 15, 2024 • edited Loading

Replace `stop` with `return` and provide and `bmi_failure` flags for EnergyModule error #108

Replace `stop` with `return` and provide and `bmi_failure` flags for EnergyModule error #108

SnowHydrology commented Apr 30, 2024 •

edited

Loading

GreyREvenson commented May 22, 2024 •

edited

Loading

GreyREvenson May 22, 2024 •

edited

Loading

GreyREvenson May 22, 2024 •

edited

Loading

drakest123 commented Jun 15, 2024 •

edited

Loading